On the Smoothness of Linear Value Function Approximations

Authors

  • Branislav Kveton
  • Milos Hauskrecht
Abstract

Markov decision processes (MDPs) with discrete and continuous state and action components can be solved efficiently by hybrid approximate linear programming (HALP). The main idea of the approach is to approximate the optimal value function by a set of basis functions and optimize their weights by linear programming. It is known that the solution to this convex optimization problem minimizes the L1-norm distance between the optimal value function and its approximation. In this paper, we relate this measure to the max-norm error of the same value function. We believe that this theoretical analysis may help to understand the quality of HALP approximations in continuous domains.

Introduction

Markov decision processes (MDPs) (Bellman 1957; Puterman 1994) provide an elegant mathematical framework for solving sequential decision problems in the presence of uncertainty. However, traditional techniques for solving MDPs are computationally infeasible in real-world domains, which are factored and represented by both discrete and continuous state and action variables. Approximate linear programming (ALP) (Schweitzer & Seidmann 1985) has recently emerged as a promising approach to address these challenges (Kveton & Hauskrecht 2006).

Our paper centers around hybrid ALP (HALP) (Guestrin, Hauskrecht, & Kveton 2004), which is an established framework for solving large factored MDPs with discrete and continuous state and action variables. The main idea of the approach is to approximate the optimal value function by a linear combination of basis functions and optimize it by linear programming (LP). Combining factored reward and transition models with the linear value function approximation is what makes the approach scalable. The quality of HALP solutions inherently depends on the choice of basis functions. Therefore, it is often assumed that these are provided as a part of the problem definition, which is unrealistic.
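For the discrete-state case, the LP described above is commonly written as in the sketch below. The symbols are assumptions introduced for illustration and are not taken from the text: $f_i$ are the basis functions, $w_i$ their weights, $\psi$ a strictly positive state-relevance weighting, and $V^{w}(x) = \sum_i w_i f_i(x)$ the resulting approximation. HALP handles continuous variables by replacing sums with integrals, which this sketch omits.

```latex
\begin{align*}
\min_{w} \quad & \sum_{x} \psi(x) \sum_{i} w_i f_i(x) \\
\text{s.t.} \quad & \sum_{i} w_i f_i(x) \;\ge\; R(x, a) + \gamma \sum_{x'} P(x' \mid x, a) \sum_{i} w_i f_i(x')
  \qquad \forall\, x, a
\end{align*}
```

Every feasible $V^{w}$ upper-bounds the optimal value function $V^{*}$ pointwise, so minimizing the $\psi$-weighted objective amounts to minimizing the $\psi$-weighted L1-norm distance between $V^{w}$ and $V^{*}$.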
The goal of this paper is to analyze the quality of HALP approximations. Based on the analysis, we provide simple advice for selecting basis functions.

Hybrid factored MDPs

Discrete-state factored MDPs (Boutilier, Dearden, & Goldszmidt 1995) permit a compact representation of stochastic decision problems by exploiting their structure. In this work, we consider hybrid factored MDPs with exponential-family transition models (Kveton & Hauskrecht 2006). This model extends discrete-state factored MDPs to the domains of discrete and continuous state and action variables.

A hybrid factored MDP with an exponential-family transition model (HMDP) (Kveton & Hauskrecht 2006) is given by a 4-tuple $M = (X, A, P, R)$, where $X = \{X_1, \dots, X_n\}$ is a state space characterized by a set of discrete and continuous variables, $A = \{A_1, \dots, A_m\}$ is an action space represented by action variables, $P(X' \mid X, A)$ is an exponential-family transition model of state dynamics conditioned on the preceding state and action choice, and $R$ is a reward model assigning immediate payoffs to state-action configurations. In the remainder of the paper, we assume that the quality of a policy is measured by the infinite horizon discounted reward $E\left[\sum_{t=0}^{\infty} \gamma^{t} r_t\right]$, where $\gamma \in [0, 1)$ is a discount factor and $r_t$ is the reward obtained at the time step $t$.
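As a small illustration of the discounted reward criterion, the following Python sketch evaluates the sum of gamma^t * r_t over a finite reward sequence. The function names `discounted_return` and `constant_reward_limit` are hypothetical helpers introduced here, not part of the paper, and the finite sum only approximates the infinite horizon.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Return sum over t of gamma**t * rewards[t] for a finite reward sequence."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    total = 0.0
    # Accumulate from the last reward backwards (Horner's scheme),
    # which avoids computing gamma**t explicitly.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


def constant_reward_limit(r: float, gamma: float) -> float:
    """Infinite-horizon discounted sum of a constant reward r: r / (1 - gamma)."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    return r / (1.0 - gamma)
```

Because gamma < 1, the contribution of late rewards decays geometrically, so a long truncated sequence of constant rewards comes arbitrarily close to the closed-form limit r / (1 - gamma).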

Similar resources

A New High-order Takagi-Sugeno Fuzzy Model Based on Deformed Linear Models

Amongst possible choices for identifying complicated processes for prediction, simulation, and approximation applications, high-order Takagi-Sugeno (TS) fuzzy models are fitting tools. Although they can construct models with rather high complexity, they are not as interpretable as first-order TS fuzzy models. In this paper, we first propose to use Deformed Linear Models (DLMs) in consequence pa...

Revised Time-Series Estimates of the Employed Population by Economic Sector in Iran (1335-1385)

This paper estimates the time series of the employed population in Iran's economic sectors over 1955-2006. Natural spline and monotone cubic spline interpolation methods were used to estimate the time series statistics of the employed population between two censuses or two consecutive samplings. An explanatory variable was chosen according to labor demand and labor market structure....

Adaptive Approximation of Functions with Discontinuities

One of the basic principles of Approximation Theory is that the quality of approximations increases with the smoothness of the function to be approximated. Functions that are smooth in certain subdomains will have good approximations in those subdomains, and these sub-approximations can possibly be calculated efficiently in parallel, as long as the subdomains do not overlap. This paper proposes ...

Optimally Sparse Approximations of 3D Functions by Compactly Supported Shearlet Frames

Abstract. We study efficient and reliable methods of capturing and sparsely representing anisotropic structures in 3D data. As a model class for multidimensional data with anisotropic features, we introduce generalized three-dimensional cartoon-like images. This function class will have two smoothness parameters: one parameter β controlling classical smoothness and one parameter α controlling a...

N-Widths and ε-Dimensions for High-Dimensional Approximations

In this paper, we study linear trigonometric hyperbolic cross approximations, Kolmogorov n-widths dn(W,H γ), and ε-dimensions nε(W,H γ) of periodic d-variate function classes W with anisotropic smoothness, where d may be large. We are interested in finding the accurate dependence of dn(W,H γ) and nε(W,H γ) as a function of two variables n, d and ε, d, respectively. Recall that n, the dimension ...

Nonlinear Approximation Using Gaussian Kernels

It is well-known that non-linear approximation has an advantage over linear schemes in the sense that it provides comparable approximation rates to those of the linear schemes, but to a larger class of approximands. This was established for spline approximations and for wavelet approximations, and more recently for homogeneous radial basis function (surface spline) approximations. However, no s...

Publication date: 2006